REMEMBERING LEO BREIMAN

By Adele Cutler 

Utah State University 

Leo Breiman was a highly creative, influential researcher with a 
down-to-earth personal style and an insistence on working on impor- 
tant real world problems and producing useful solutions. This paper 
is a short review of Breiman's extensive contributions to the field of 
applied statistics. 

1. Introduction. How many theoretical probabilists walk away from a 
tenured faculty position at a top university and set out to make their living 
as consultants? How many applied consultants get hired into senior faculty 
positions in first-rate research universities? How many professors with a fine 
reputation in their field establish an equally fine reputation in a different 
field after retirement? Leo Breiman did all of these things and more. He 
was an inspiring speaker and a convincing writer, doing both with seemingly 
boundless enthusiasm, in an unpretentious, forthright manner that he called 
his "casual, homespun way." He was intelligent and thought deeply about 
research. But there are a number of bright, talented statisticians. What 
made Breiman different? For one thing, he was willing to take risks. By 
and large, statisticians are not great risk-takers. We tend not to stray too 
far from what we know, tend not to tackle problems for which we have 
no tools, tend to adopt or adapt existing ideas instead of coming up with 
completely new ones. Linked to this willingness to take risks was Breiman's 
unusual creativity. It was not a wild, off-the-wall creativity — it was grounded 
in a sound knowledge of theoretical principles and directed by an intuition 
gained by working intensively with data, along with a generous dose of 
common sense. Breiman was driven by challenging and important real-data 
problems that people cared about. He didn't spend time publishing things 
just because he could, filling the gaps just because they were there. Lastly, 
he was tenacious. He would not give up on a problem until he, or someone 
else, got to the bottom of what was going on. 
Some of Breiman's ideas have advanced the field in and of themselves (e.g., 
bagging, random forests) while others have contributed more indirectly (e.g., 
Breiman's nonnegative garrote [Breiman (1995a)] inspired the lasso [Tibshirani 
(1996)]). Although his joint work on tree-based methods [Breiman et al. 
(1984)] was arguably his most important contribution to science, he viewed 
random forests as the culmination of his work. I consider myself privileged 
to have been able to work with Leo Breiman for almost 20 years, as his stu- 
dent, collaborator and friend, and I'm honored to have been asked to write 
this review of his contributions to applied statistics. I have divided the paper 
into roughly chronological sections, but these have considerable overlap and 
are intended to be organizational rather than definitive. I kept biographical 
details to a minimum; those interested in a biography are referred to Olshen 
(2001). I do not feel qualified to discuss Breiman's work on the 1991 Census 
adjustment [Breiman (1994)] and have omitted a few other isolated pieces of 
work such as Breiman, Tsur and Zemel (1993); Breiman, Meisel and Purcell 
(1977); Breiman (1999b); Breiman and Cutler (1993). 

2. Early work. Breiman was born in New York City in 1928 and edu- 
cated in California, receiving his Ph.D. in mathematics from UC Berkeley in 
1954. He earned tenure as a probabilist in the UCLA Mathematics Depart- 
ment but soon after, he "got tired of doing theory and wanted something 
that would be more exciting" (personal communication) so he resigned. At 
this time, Breiman was already interested in classification, co-authoring a 
paper on the convergence properties of a "Learning Algorithm" [Breiman 
and Wurtele (1964)]. Curiously, the paper had only two references, one of 
which was to some early work by Seymour Papert, who was later to become 
one of the pioneers of artificial intelligence and co-author of an influential 
(and controversial) book on perceptrons [Minsky and Papert (1969)]. 

After resigning, the first thing Breiman did was to write his probability 
book [Breiman (1968)] and then, with no formal statistical training, he pro- 
ceeded to spend the next 13 years as a consultant. As well as some work 
in transportation, he worked for William Meisel's division of Technology 
Services Corporation, doing environmental studies and unclassified defense 
work. It's difficult to imagine making such a transition today, but one can 
speculate that it was, in part, because he did not have a background in 
applied statistics that Breiman was so successful at consulting. Certainly 
the prediction problems on which he worked, some of which are mentioned 
in Breiman (1984) and Section 3 of Breiman (2001c), would have been a 
challenge for the tools and computers of the time. In Breiman (2001c), he 
acknowledges Meisel for helping him "make the transition from probability 
theory to algorithms." 






3. Classification and regression trees. One of the early problems Breiman 
worked on as a consultant was to classify ship types from the peaks of radar 
profiles. The observations had different numbers of peaks and the number 
of peaks and their locations depended on the angle the ship made with the 
radar. After "a lot of head-scratching and a lot of time just thinking" the idea 
of a classification tree came to him "out of the blue." After this, Meisel's 
research team began using trees regularly. Charles Stone was brought on 
board, became interested in trees, and worked with Breiman to improve 
accuracy. In the early to mid-1970s, Breiman and Stone came up with the 
breakthrough idea of using cross validation to prune large trees. 

It's difficult to obtain published work from Breiman's consulting years, 
but by 1976, Breiman and Meisel published an early version of regression 
trees [Breiman and Meisel (1976)] which broke down the data space into 
regions and fitted a linear regression in each region. Regions were split using 
a randomly oriented plane and an F-ratio was used to determine if the split 
had significantly reduced the residual sum of squares; if not, another random 
split was tried. In retrospect, the idea of using randomly chosen splits seems 
a good 20 years ahead of its time. The statement "many typical data analytic 
problems are characterized by their high dimensionality. . . and the lack of 
any a priori identification of a natural and appropriate family of regression 
functions" [Breiman and Meisel (1976)] was a clear indicator of Breiman's 
future research directions. 

In 1976, Breiman met Jerome Friedman, a high-energy particle physicist, 
and soon Friedman was also working as a consultant for Technology Services 
Corporation. Both Friedman and Stone had connections to Richard Olshen, and the four started to 
collaborate. Apparently, they decided to publish their research as a book 
because they believed the work was unlikely to be published in the standard 
statistical journals. 

In 1980, Stone and Breiman joined the UC Berkeley Statistics Depart- 
ment, and the group experimented with different splitting criteria, refined 
the cross-validation approach, and came up with the idea of surrogate splits. 
Several things set this work apart from other early work on trees. First, 
they did painstaking experiments. As they report in Breiman and Friedman 
(1988), "In the course of the research that led to CART, almost two years 
were spent experimenting with different stopping rules. Each stopping rule 
was tested on hundreds of simulated data sets with different structures." 
Second, they kept applications in the foreground of their work, due in part 
to Breiman's years as a consultant. Third, they had what Breiman referred 
to as "some beautiful and complex theory." The book, priced low to make 
it accessible, was published in 1984 [Breiman et al. (1984)]. 

4. ACE and additive models. I once heard Charles Stone express regret 
that the CART group had not written a follow-up book of "things we tried 
that didn't work." I expect such a book could have prevented a number of 
researchers from reinventing the wheel, but few would want to read such a 
book, much less write it. In fact, after completing Breiman et al. (1984), 
Breiman admitted to being "completely fed up with thinking about trees." 
Breiman and Friedman continued to talk, because both were interested in 
high-dimensional data analysis, and soon they came up with the Alternating 
Conditional Expectations (ACE) algorithm [Breiman and Friedman (1985)]. 
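For predictor variables $X_1, \ldots, X_p$ and response $Y$, ACE defines 
$\phi_1^*, \ldots, \phi_p^*$ and $\theta^*$ to minimize 

$$E\Bigl[\theta(Y) - \sum_{j=1}^{p} \phi_j(X_j)\Bigr]^2$$

under the constraint $\mathrm{Var}(\theta(Y)) = 1$. Estimates of 
$\phi_1^*, \ldots, \phi_p^*$ and $\theta^*$ were obtained 
using an iterative optimization procedure involving (nonlinear) smoothing 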
to estimate each of the transformations while holding the others fixed. This 
was an application of the Gauss-Seidel algorithm of numerical linear algebra. 
A simpler version, taking $\theta$ as the identity, is the familiar "backfitting" 
algorithm [Hastie and Tibshirani (1986), Buja, Hastie and Tibshirani (1989)]. 
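
To make the iteration concrete, here is a minimal Python sketch of backfitting, the special case with the response transformation fixed at the identity. It is not the ACE implementation: the running-mean smoother, the function names, and the simulated data are all assumptions chosen for illustration. Each pass re-estimates one additive component by smoothing the partial residuals against its predictor while the other components are held fixed.

```python
import math
import random


def running_mean_smooth(x, r, k=9):
    """Smooth r against x with a k-nearest-neighbour running mean.

    This crude smoother is an assumption of the sketch, not the smoother
    used in the ACE paper.
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    smoothed = [0.0] * n
    for pos, i in enumerate(order):
        lo = max(0, min(pos - k // 2, n - k))
        window = order[lo:lo + k]
        smoothed[i] = sum(r[j] for j in window) / len(window)
    return smoothed


def backfit(columns, y, k=9, sweeps=10):
    """Fit y ~ intercept + sum_j f_j(x_j), one component at a time."""
    n, p = len(y), len(columns)
    intercept = sum(y) / n
    f = [[0.0] * n for _ in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            # Partial residuals: remove the intercept and all other components.
            partial = [
                y[i] - intercept - sum(f[l][i] for l in range(p) if l != j)
                for i in range(n)
            ]
            fj = running_mean_smooth(columns[j], partial, k)
            centre = sum(fj) / n
            f[j] = [v - centre for v in fj]  # keep each component centred
    return intercept, f


def demo_backfit(seed=0, n=200):
    """Simulated additive data (hypothetical): y = sin(x1) + x2^2 + noise."""
    rng = random.Random(seed)
    x1 = [rng.uniform(-3, 3) for _ in range(n)]
    x2 = [rng.uniform(-2, 2) for _ in range(n)]
    y = [math.sin(a) + b * b + rng.gauss(0, 0.1) for a, b in zip(x1, x2)]
    intercept, f = backfit([x1, x2], y)
    fitted = [intercept + f[0][i] + f[1][i] for i in range(n)]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = sum((yi - intercept) ** 2 for yi in y)
    return rss, tss
```

Calling `demo_backfit()` returns the residual and total sums of squares, so the fraction of variation captured by the additive fit can be inspected directly.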

ACE was the first in a series of papers Breiman wrote on smoothing 
and additive models. Breiman and Peters (1992) compared four scatter- 
plot smoothers using an extensive simulation. Building on the spline models 
used in Breiman and Peters (1992), Breiman's $\Pi$ method [Breiman (1991)], 
with the colorful acronym "PIMPLE," fit additive models of products of 
(univariate) cubic splines. Hinging hyperplanes [Breiman (1993b)] fit an 
additive function of hyperplanes, continuously joined along a line called a 
"hinge." According to Breiman (1993a), while ACE provided the "first avail- 
able method for fitting additive models to data," it had some difficulties. For 
small sample sizes, the results were "noisy and erratic." The nonlinearity 
of the smoother combined with the iterative algorithm led to results that 
were "difficult to analyze and sometimes mildly unstable." So Breiman went 
back to the drawing board, adapting a spline-based method using stepwise 
deletion of knots [Smith (1982)], resulting in Breiman (1993a). This paper 
contains early thoughts on using cross-validation to measure instability: "If 
transformations change drastically when one or a few cases are removed, 
then they do not reflect an overall pattern in the data." These early ideas 
of instability ultimately led to some of Breiman's most influential work. 

5. Multivariate techniques. While all Breiman's work was multivariate, 
some was more clearly affiliated with traditional multivariate techniques. 
In 1984, Breiman and Ihaka released a technical report [Breiman and Ihaka 
(1984)] describing a nonlinear, smoothing-based version of discriminant anal- 
ysis. The work was never published but it motivated the work on "Flexible 
Discriminant Analysis" by Hastie, Tibshirani and Buja (1994). 
In his consulting days, one of the problems Breiman studied was next-day 
ozone prediction. One of his ideas was to represent each day as a mixture 
of "extreme" or "archetypal" days. For example, an archetypal sunny day 
would be as sunny as possible, an archetypal rainy day would have as much 
rain as possible, an archetypal foggy day would have fog for as long as 
possible, and so on. Most days would not be archetypal — they would fall in 
between the archetypes, resembling each to a greater or lesser extent. For 
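data $\{x_i, i = 1, \ldots, N\}$, the problem was to find archetypal points 
$\{z_k, k = 1, \ldots, K\}$ to minimize 

$$\sum_{i=1}^{N} \Bigl\| x_i - \sum_{k=1}^{K} \alpha_{ik} z_k \Bigr\|^2$$

subject to the constraints $\alpha_{ik} \geq 0$, $\sum_{k} \alpha_{ik} = 1$, while also constraining the 
$z_k$'s to fall on or inside the convex hull of the data. The problem can be 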
solved using an alternating least squares algorithm [Cutler and Breiman 
(1994)]. Archetypes have been used as an alternative to cluster analysis or 
principal components in numerous disciplines. 

The final method in this section is a paper on multivariate regression, 
whimsically called "curds and whey" [Breiman and Friedman (1997)]. To 
predict correlated responses, Breiman and Friedman considered predicting 
each response by a linear combination of the ordinary least squares (OLS) 
predictors rather than the OLS predictors themselves. The method worked 
by transforming into canonical coordinates, shrinking, then transforming 
back. Cross-validation was used to choose the amount of shrinkage. 

6. Subset selection in linear regression. Breiman had a longstanding in- 
terest in submodel selection in linear regression, revealing itself in Breiman 
and Meisel (1976), which used an early version of a regression tree to esti- 
mate the "intrinsic variability" of the data, with the goal of effectively rank- 
ing the predictive capabilities of subsets of independent variables. Breiman 
and Freedman (1983) looked at determining the optimal number of regres- 
sors to minimize mean squared prediction error. Again, using prediction er- 
ror as the gold standard, Breiman (1992) and Breiman and Spector (1992) 
contained careful and thorough simulation studies for the X-fixed and X- 
random situations. 

As Efron (2001) mentioned, Leo's "openness to new ideas whatever their 
source" was an attractive feature of his work. One example of this openness 
was that in the early 1990s, Leo got interested in neural nets and started par- 
ticipating in the Neural Information Processing Systems (NIPS) conference 
and workshops. Neural nets were not really a new idea, but they were en- 
joying new popularity among computer scientists, physicists and engineers, 
who in Leo's view were turning out "thousands of interesting research papers 
related to applications and methodology" [Breiman (2001c)]. To this active 
community, Leo brought his considerable statistical background, experience 
with trees and subset selection, and perspective from years of dealing with 
real data and thinking about how to do it better. This led to Leo's most 
productive years, in part facilitated by his retirement from the UC Berkeley 
Statistics Department in early 1993, about which he said, "So far retirement 
has meant that I've got more time to spend on research" (personal 
communication). 

The first work to appear from this period, stacking [Breiman (1996a)], 
was stimulated by Wolpert (1992) and first appeared as a technical re- 
port in 1992. In Breiman (1996a), he said, "In past statistical work, all 
the focus has been on selecting the "best" single model from a class of 
models. We may need to shift our thinking to the possibility of forming 
combinations of models." In the case of stacking, this was a linear com- 
bination of predictors. Each predictor was based on what Wolpert called 
the "level 1 data" [Wolpert (1992)]. Breiman (1996a) considered a family 
of models indexed by $k = 1, \ldots, K$. For example, $k$ might be the number 
of variables in a subset selection method or $k$ might index a collection 
of shrinkage parameters $\{\lambda_k, k = 1, \ldots, K\}$ for ridge regression. For data 
$\{(x_{1n}, \ldots, x_{pn}, y_n), n = 1, \ldots, N\}$, each of the $K$ predictors was fit to the data 
with observation $n$ omitted (leave-one-out cross-validation) to give $K$ 
predictions of $y_n$, namely $z_{kn}$, $k = 1, \ldots, K$, which were the "level 1 data." The 
"stacked" predictor was $\sum_k \alpha_k z_{kn}$ where $\alpha_k \geq 0$, $k = 1, \ldots, K$, were chosen 
to minimize 

$$\sum_{n} \Bigl( y_n - \sum_{k} \alpha_k z_{kn} \Bigr)^2.$$


Breiman considered stacked subsets and stacked ridge regressions and con- 
cluded that both were better than the existing method (choosing a single 
model by cross-validation). However, stacking improved subsets more than it 
improved ridge, which Breiman suggested was due to the greater instability 
of subset selection. 
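
The stacking recipe can be sketched in a few lines of Python. This is an illustrative toy, not Breiman's code: the two base models (a constant mean and a one-variable least squares line), the function names, and the coordinate-descent solver for the nonnegative weights are all assumptions made to keep the example self-contained. The leave-one-out predictions play the role of the "level 1 data."

```python
def loo_predictions(xs, ys, fit):
    """Refit with case n removed and predict y_n (the 'level 1 data')."""
    preds = []
    for n in range(len(xs)):
        x_train = xs[:n] + xs[n + 1:]
        y_train = ys[:n] + ys[n + 1:]
        preds.append(fit(x_train, y_train)(xs[n]))
    return preds


def fit_mean(xs, ys):
    """Base model 1 (hypothetical): predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Base model 2 (hypothetical): simple least squares line."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return lambda x: ybar + slope * (x - xbar)


def sse(ys, z, alpha):
    """Sum over n of (y_n - sum_k alpha_k z_kn)^2."""
    return sum(
        (y - sum(a * zk[n] for a, zk in zip(alpha, z))) ** 2
        for n, y in enumerate(ys)
    )


def stack_weights(ys, z, sweeps=200):
    """Nonnegative weights by coordinate descent (an assumed solver)."""
    K = len(z)
    unit = lambda k: [1.0 if j == k else 0.0 for j in range(K)]
    singles = [sse(ys, z, unit(k)) for k in range(K)]
    # Start at the best single model; each update can only lower the SSE.
    alpha = unit(singles.index(min(singles)))
    for _ in range(sweeps):
        for k in range(K):
            denom = sum(v * v for v in z[k])
            if denom == 0.0:
                continue
            num = sum(
                z[k][n] * (ys[n] - sum(alpha[j] * z[j][n]
                                       for j in range(K) if j != k))
                for n in range(len(ys))
            )
            alpha[k] = max(0.0, num / denom)  # enforce alpha_k >= 0
    return alpha
```

Because the search starts at the best single model and each coordinate update is an exact constrained minimization, the stacked combination in this sketch can never have a larger leave-one-out error than the single model chosen by cross-validation.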

Building on stacking [Breiman (1996a)] and using some of his experiences 
from Breiman (1992) and Breiman and Spector (1992), Breiman introduced 
the nonnegative garrote [Breiman (1995a)]. For data as before and original 
OLS coefficients $\{\hat\beta_k\}$, the nonnegative garrote chose $\{c_k\}$ to minimize 

$$\sum_{n} \Bigl( y_n - \sum_{k} c_k \hat\beta_k x_{kn} \Bigr)^2$$

subject to the constraints $c_k \geq 0$ and $\sum_k c_k \leq s$. This was a much simpler 
idea than stacking because it did not use Wolpert's "level 1 data" [Wolpert 
(1992)] and k ranged over the predictor variables instead of denoting the size 
of a subset or the value of a shrinkage parameter. Breiman found [Breiman 
(1995a)] that the garrote had consistently lower prediction error than sub- 
set selection, and sometimes better than ridge regression. Breiman's ideas 
about instability, first mentioned in Breiman (1993a), led him to characterize 
ridge regression as stable, subset selection as unstable, and the garrote 
as intermediate. Breiman remarked that "the more unstable a procedure is, the 
more difficult it is to accurately estimate PE (prediction error)" and 
speculated about finding a "numerical measure of stability." Bühlmann and Yu 
(2006) showed some interesting results for the garrote in a boosting context. 
However, the largest impact of the garrote was that it inspired the lasso 
[Tibshirani (1996)], which is currently the method of choice, in part because 
of the garrote's dependence on the OLS coefficients $\{\hat\beta_k\}$. 

Breiman's notions of stability were further explored in Breiman (1996b). 
He compared ridge regression, subset selection and two versions of garrote 
and stated, "Unstable procedures can be stabilized by perturbing the data, 
getting a new predictor sequence. . . and then averaging over many such 
predictor sequences." The types of perturbation he considered were leave-one-out 
cross-validation, leave-ten-out cross-validation and adding random noise to 
the response variable. He stated [Breiman (1996b)] "we do not know yet 
what the best stabilization method is." 

7. Bagging. Breiman released an early version of Breiman (1996b) in 
June 1994, but by September of the same year he released yet another tech- 
nical report in which he had already resolved some of the questions raised in 
Breiman (1996b). He called the report "Bagging Predictors" and it was to 
be published as Breiman (1996c). The name comes from "bootstrap aggre- 
gating" because in bagging, the data were perturbed by taking bootstrap 
samples and the resulting predictors were averaged (aggregated) to give 
the "bagged estimate." The classification version aggregates by voting the 
predictors. In November 1994, Breiman presented bagging as part of a Tu- 
torial at the NIPS conference, where it was immediately embraced by the 
neural net community. According to Google Scholar, citations of Breiman 
(1996c) already exceed 6300, slightly higher than Efron's 1979 bootstrap pa- 
per [Efron (1979)]. The simplicity and elegance of bagging made it appealing 
in a community where new ideas tended to be technically complex. 

In bagging, each predictor was fit to a bootstrap sample, so roughly 37% 
of the observations were not included in the fit ("out-of-bag"). In an 
unpublished technical report, Breiman (1997b) described how to use these 
out-of-bag observations for estimating node probabilities and generalization error. 
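
The sketch below illustrates bagging and the out-of-bag idea in plain Python. It is an assumption-laden toy rather than Breiman's procedure: the base learner is a one-nearest-neighbour regressor chosen only because it is short, and all function names are hypothetical. The out-of-bag fraction it reports should sit near $(1 - 1/n)^n \approx e^{-1} \approx 0.37$.

```python
import random


def fit_1nn(xs, ys):
    """Hypothetical base learner: predict the response of the nearest x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def bag(xs, ys, fit, n_boot, rng):
    """Fit one predictor per bootstrap sample; remember which cases were used."""
    n = len(xs)
    models, bags = [], []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit([xs[i] for i in idx], [ys[i] for i in idx]))
        bags.append(set(idx))
    return models, bags


def bagged_predict(models, x):
    """Aggregate by averaging (regression); classification would vote."""
    return sum(m(x) for m in models) / len(models)


def oob_fraction(bags, n):
    """Average share of cases left out of a bootstrap sample."""
    return sum(1.0 - len(b) / n for b in bags) / len(bags)


def oob_mse(xs, ys, models, bags):
    """Error estimate using, for each case, only predictors that never saw it."""
    errors = []
    for i, (x, y) in enumerate(zip(xs, ys)):
        votes = [m(x) for m, b in zip(models, bags) if i not in b]
        if votes:
            errors.append((y - sum(votes) / len(votes)) ** 2)
    return sum(errors) / len(errors)
```

A typical call would be `models, bags = bag(xs, ys, fit_1nn, 50, random.Random(0))`, followed by `oob_mse(xs, ys, models, bags)` to get an error estimate without a separate test set.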

Although bagging trees improved the accuracy of trees, Breiman liked the 
simple, understandable structure of individual trees and was not ready to 
give up on them. Noting that trees have "the disadvantage that the splits get 
noisier as you go down" (personal communication), he worked with Nong 
Shang [Shang and Breiman (1996)] to try to improve the stability of trees 
by estimating the joint density of the data and basing the splits on this 
density estimate instead of directly on the data. One of the problems of this 
method was that density estimates depended on numerous parameters and 
Breiman referred to it later [Breiman (1998b)] as a "complex and unwieldy 
procedure." Another attempt, described in Breiman (1998b), was to generate 
new "pseudo-data" by randomly choosing an existing data point and moving 
its predictor variables a small step towards a second randomly-chosen data 
point. The new predictor values, together with the response for the original 
data point, gave the pseudo-data. The step size was chosen to be uniform 
on the interval (0, d) where d was a parameter of the method. Although the 
results appeared promising, the method did not give improvements on large 
datasets and the paper was never published. 
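
The pseudo-data step described above can be sketched as follows. The review does not say how the "small step" was scaled, so this example assumes the step is a fraction of the way from the first point to the second, drawn uniformly from (0, d); the function name and that interpretation are assumptions, not details taken from Breiman (1998b).

```python
import random


def pseudo_point(X, y, d, rng):
    """Generate one pseudo-observation.

    X is a list of predictor vectors, y the matching responses.
    Assumption: the step is a uniform fraction t in (0, d) of the
    vector from the chosen point to a second randomly chosen point.
    """
    i = rng.randrange(len(X))      # the point being moved
    j = rng.randrange(len(X))      # the point it moves towards
    t = rng.uniform(0.0, d)        # step size
    x_new = [a + t * (b - a) for a, b in zip(X[i], X[j])]
    return x_new, y[i]             # the response stays with the original point
```

With `d` well below 1 the new predictors stay close to the original point, which is why the original response is a plausible label for them.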

Breiman tried to improve upon bagging in a number of other ways. His 
"iterated" or "adaptive" bagging [Breiman (2001b)] was designed to reduce 
the bias of bagged regressions by successively altering the output values 
using the out-of-bag data. Naturally, this biases the out-of-bag generalization 
error estimates, but Breiman found that for the purpose of bias reduction it 
worked well [Breiman (2001b)]. In a similar vein, Breiman (2000a) provided 
an alternative to bagging by combining predictors fit to data for which only 
the output variables have been perturbed. It's not clear whether these ideas 
would have endured because Breiman did not release code and they were 
discarded once he discovered random forests [Breiman (2001a)]. 

8. Boosting and arcing. While Breiman developed bagging, Freund and 
Schapire worked on AdaBoost [Schapire (1990), Freund (1995), Freund and 
Schapire (1996)]. Breiman referred to the AdaBoost algorithm as "the most 
accurate general purpose classification algorithm available" [Breiman (2004b)]. 
Like bagging, AdaBoost combined a sequence of predictors. Unlike bagging, 
each predictor was fit to a sample from the training data, with larger sam- 
pling weights given to observations that had been misclassified by earlier 
predictors in the sequence. The predictions were combined using perfor- 
mance weights. In a personal communication, Breiman wrote, "Some of my 
latest efforts are to understand Adaboost better. Its really a strange al- 
gorithm with unexpected behavior. Its become like searching for the Holy 
Grail!!" In his quest, Breiman produced a series of papers [Breiman (1997a, 
1998a, 1998c, 1999a, 2000b, 2004b)]. He noted in Breiman (1998a) that if 
AdaBoost "is run far past the point at which the training set error is zero, it 
gives better performance than bagging on a number of real data sets." This 
was a great mystery and Breiman was determined to get to the bottom of it. 
In Breiman (1998a), Breiman constructed a more general class of algorithms, 
"arcing," of which AdaBoost ("arc-fs") was a special case. One contribution 
of Breiman (1998a) was that Breiman removed the randomness of boosting 
by using a weighted version of the classifier instead of sampling weights. 
Focusing on bias and variance, he concluded that "Arcing does better than 
bagging because it does better at variance reduction" [Breiman (1998a)], 
but Schapire et al. (1998) gave examples in which the main effect of Ad- 
aBoost was to reduce bias and proposed their own reasons for why boosting 
worked so well. Breiman thought the explanation was incomplete [Breiman 
(1999a)]. 

Breiman's work on half and half bagging [Breiman (1998c)] was stimu- 
lated by one of the referees of Breiman (1998a), who commented that the 
probability weight at a given step was equally divided between the points 
misclassified, and those correctly classified, at the previous step. In Breiman 
(1998c) Breiman divided the data into two parts, one containing "easy" 
points, the other "hard" points, based on previous classifiers in the sequence. 
He randomly sampled an equal number of cases from both groups and fitted 
a classification tree. For the first time, the tree was grown deep (one example 
per terminal node), which he later carried over to random forests [Breiman 
(2001a)]. 
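
The referee's observation is easy to check numerically. The sketch below uses the standard exponential form of the AdaBoost weight update, which is an assumption about notation rather than something spelled out in this review; the function name is hypothetical. After the update, the cases misclassified by the current predictor carry exactly half of the total weight.

```python
import math


def adaboost_reweight(weights, misclassified):
    """One AdaBoost reweighting step (standard form, assumed here).

    weights: current case weights summing to 1.
    misclassified: booleans, True where the current predictor was wrong.
    Returns the normalized new weights and the predictor's vote weight.
    """
    eps = sum(w for w, m in zip(weights, misclassified) if m)
    if not 0.0 < eps < 0.5:
        raise ValueError("weighted error must lie strictly between 0 and 0.5")
    alpha = 0.5 * math.log((1.0 - eps) / eps)
    # Up-weight the mistakes, down-weight the correctly classified cases.
    raw = [w * math.exp(alpha if m else -alpha)
           for w, m in zip(weights, misclassified)]
    total = sum(raw)
    return [w / total for w in raw], alpha
```

For example, with ten equally weighted cases of which two are misclassified, the weighted error is 0.2, and after the update the two hard cases together hold the same weight as the eight easy ones.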

In Breiman (1997a), he showed that AdaBoost is a "down-the-gradient" 
method for minimizing an exponential function of the error. Independently, 
Hastie, Tibshirani and Friedman (2000) presented "The Statistical View of 
Boosting." 

About his "Infinity Theory" paper [Breiman (2000b)], Breiman stated 
in August 2000: "I've been compulsively working on a theory paper about 
tree ensembles which I got sick and tired of but knew that if I didn't keep 
going. . . it would never get finished." The paper was released as a technical 
report, cited by Zhang (2004) and Bühlmann and Yu (2003), among others. 
A later version was published as Breiman (2004b) and in this paper Breiman 
showed that the population version of AdaBoost was Bayes-consistent. In the 
meantime, several publications, including Hastie, Tibshirani and Friedman 
(2000), suggested that AdaBoost could overfit in the limit and Jiang (2004) 
showed that in the finite sample case, AdaBoost was only Bayes-consistent 
if it was regularized. 

9. Random forests. In the light of boosting, Breiman spent a lot of time 
trying to improve individual trees [Shang and Breiman (1996), Breiman 
(1998b)] and bagged trees [Breiman (2000a, 2001b)]. He also worked very 
hard to understand what was going on with boosting [Breiman (1997a, 
1998a, 1998c, 1999a, 2000b, 2004b)]. However, he never seriously produced 
a boosting algorithm for practical use, and I believe the reason was that he 
wanted a method that could give meaningful results for data analysis, not 
just prediction, and he didn't think he could get this by combining depen- 
dent predictors. The culmination of his work on bagging and how to improve 
it, and his work trying to understand boosting, was a method Breiman called 
"random forests" (RF) [Breiman (2001a)]. Random forests fit trees to inde- 
pendent bootstrap samples from the data. The trees were grown large (for 
classification) and at each node independently, m predictors were chosen 
out of the p available, and the best possible split on these m predictors was 
used. As a default for classification, Breiman settled on choosing $m = \sqrt{p}$. 
In RF we see a synthesis of the bagging ideas (bootstrapping), along with 
ideas that came from boosting (growing large trees), and Breiman's un- 
derstanding of how to increase instability (randomly choosing predictors at 
each node) to get more accurate aggregate predictions. Once he came up 
with RF, Breiman stopped working on new algorithms and started work 
on how to get the most out of the RF results. He developed measures of 
variable importance and proximities between observations. Together, we de- 
veloped a program for visualizing and interpreting RF results (available from 
http://www.math.usu.edu/~adele/forests/cc_graphics.htm). Chao Chen 
and Andy Liaw worked with Breiman on ways to adjust RF for unbalanced 
classes [Chen, Liaw and Breiman (2004)]. Vivian Ng worked with him on 
detecting interactions [Ng and Breiman (2005)]. In his last technical report, 
Breiman showed consistency for a simple version of RF [Breiman (2004a)]. 
But the work on RF did not stop when Breiman died. Several extensions have 
been published; for example, Diaz-Uriarte and Alvarez de Andres (2006) 
developed a variable selection procedure, Meinshausen (2006) introduced 
quantile regression forests, and Hothorn et al. (2006), Ishwaran et al. (2008) 
considered forests for survival analysis. Although theory is still thin on the 
ground, Lin and Jeon (2006) showed that RF behaves like a nearest neighbor 
classifier with an adaptive metric and Biau, Devroye and Lugosi made some 
progress on consistency in a paper dedicated to Breiman's memory [Biau, 
Devroye and Lugosi (2008)]. Numerous applied articles have appeared and 
even a number of YouTube videos. I believe Breiman would be truly de- 
lighted at the popularity of the method. 
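
The node-level randomization that distinguishes random forests from bagged trees can be sketched as below. This is an illustration under stated assumptions, not Breiman's Fortran code: the Gini criterion, the rounding of $\sqrt{p}$ down to an integer, and the function names are all choices made for the example. At a node, only a random subset of $m$ predictors is searched for the best split.

```python
import math
import random


def gini(labels):
    """Gini impurity of a list of class labels (assumed splitting criterion)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for c in labels:
        counts[c] = counts.get(c, 0) + 1
    return 1.0 - sum((k / n) ** 2 for k in counts.values())


def best_split_random_subset(X, y, rng, m=None):
    """Search for the best split using only m randomly chosen predictors.

    Returns the sampled predictor indices and the best split found as
    (weighted impurity, predictor index, threshold), or None if no
    sampled predictor takes more than one value.
    """
    p = len(X[0])
    if m is None:
        m = max(1, int(math.sqrt(p)))  # default m = sqrt(p), rounded down
    candidates = rng.sample(range(p), m)
    best = None
    for j in candidates:
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [c for row, c in zip(X, y) if row[j] <= thr]
            right = [c for row, c in zip(X, y) if row[j] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, thr)
    return candidates, best
```

In a full forest this function would be called afresh at every node of every tree, each tree being grown on its own bootstrap sample, so that different trees end up splitting on different predictors.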

10. Software. Leo developed his own code, invariably in Fortran. I 
collaborated with him on the random forests Fortran code and documentation 
(http://www.math.usu.edu/~adele/forests/cc_home.htm). Andy Liaw and 
Matt Wiener developed an interface to R [Liaw and Wiener (2002)]. Al- 
though Leo supported the R release and admired the free-software philoso- 
phy of R, he regarded R as a tool for "Ph.D. statisticians" and he wanted 
his code to also be available with an easy to use graphical user interface 
(GUI). GUI-driven software for classification and regression trees and ran- 
dom forests is available from Salford Systems. Versions of trees, random 
forests and archetypes are available in R (packages rpart, randomForest 
[Liaw and Wiener (2002)], and archetypes [Eugster and Leisch (2009)]). 
11. Textbooks. In addition to his papers, Breiman wrote three textbooks 
[Breiman (1968, 1969, 1973)], the first of which is in SIAM's "Classics in 
Applied Mathematics" series. Perhaps even more impressive is the fact that other 
scholars are now writing texts that refer extensively to Breiman's work, 
including trees, bagging and random forests [see Berk (2008), Hastie, Tib- 
shirani and Friedman (2009) and Izenman (2008)]. 

12. Philosophy. Breiman passionately believed that statistics should be 
motivated by problems in data analysis. Comments such as 

If statistics is an applied field and not a minor branch of mathematics, then 
more than 99% of the published papers are useless exercises. [Breiman (1995b)] 

show how deeply he believed that statistics needed a change of direction. 
When he heard that Breiman (1998a) was to be published with discussion 
in The Annals of Statistics, he commented that "it would sure liven things 
up. . . maybe get some blood moving in the statistical main stream of asymp- 
topia" (personal communication). 

Although it is not widely cited, I believe Breiman's "Two Cultures" paper 
[Breiman (2001c)] is one of his most widely read, at least among statisticians. 
The paper contained Breiman's views about where the field was going and 
what needed to be done. To conclude, he said: 

The roots of statistics, as in science, lie in working with data and checking 
theory against data. I hope in this century our field will return to its roots. 
There are signs that this hope is not illusory. Over the last ten years, there has 
been a noticeable move toward statistical work on real world problems and 
reaching out by statisticians toward collaborative work with other disciplines. 
I believe this trend will continue and, in fact, has to continue if we are to 
survive as an energetic and creative field. [Breiman (2001c)] 